Gene number expansion and contraction in vertebrate genomes with respect to invertebrate genomes.
نویسندگان
چکیده
Where did vertebrate genes come from? Here we address this question by analyzing eight completely sequenced land vertebrate genomes and six completely sequenced invertebrate genomes. Approximately 70% of the vertebrate genes can be found in the six invertebrate genomes with the standard homology search criteria (denoted as V.MCL), another approximately 6% can be found with relaxed search criteria, and an additional approximately 2% can be found in sequenced fungal and bacterial genomes. Thus, a substantial proportion of vertebrate genes (approximately 22%) cannot be found in the nonvertebrate genomes studied (denoted as Vonly). Interestingly, genes in Vonly are predominantly singletons, while the majority of genes in the other three groups belong to gene families. The proteins of Vonly tend to evolve faster than those of V.MCL. Surprisingly, in many cases the family sizes in V.MCL are only as large as or even smaller than their counterparts in the invertebrates, contrary to the general perception of a larger family size in vertebrates. Interestingly, in comparison with the family size in invertebrates, vertebrate gene families involved in regulation, signal transduction, transcription, protein transport, and protein modification tend to be expanded, whereas those involved in metabolic processes tend to be contracted. Furthermore, for almost all of the functional categories with family size expansion in vertebrates, the number of gene types (i.e., the number of singletons plus the number of gene families) tends to be over-represented in Vonly, but under-represented in V.MCL. Our study suggests that gene function is a major determinant of gene family size.
منابع مشابه
Predicting CpG Islands and Their Relationship with Genomic Feature in Cattle by Hidden Markov Model Algorithm
Cattle supply an important source of nutrition for humans in the world. CpG islands (CGIs) are very important and useful, as they carry functionally relevant epigenetic loci for whole genome studies. As a matter of fact, there have been no formal analyses of CGIs at the DNA sequence level in cattle genomes and therefore this study was carried out to fill the gap. We used hidden markov model alg...
متن کاملEvolutionary Transition of Promoter and Gene Body DNA Methylation across Invertebrate–Vertebrate Boundary
Genomes of invertebrates and vertebrates exhibit highly divergent patterns of DNA methylation. Invertebrate genomes tend to be sparsely methylated, and DNA methylation is mostly targeted to a subset of transcription units (gene bodies). In a drastic contrast, vertebrate genomes are generally globally and heavily methylated, punctuated by the limited local hypo-methylation of putative regulatory...
متن کاملEvaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes
Growing amount of information on biological sequences has made application of statistical approaches necessary for modeling and estimation of their functions. In this paper, sensitivity and specificity of the first and second Markov chains for prediction of genes was evaluated using the complete double stranded DNA virus. There were two approaches for prediction of each Markov Model parameter,...
متن کاملComparative Genomic Study Reveals a Transition from TA Richness in Invertebrates to GC Richness in Vertebrates at CpG Flanking Sites: An Indication for Context-Dependent Mutagenicity of Methylated CpG Sites
Vertebrate genomes are characterized with CpG deficiency, particularly for GC-poor regions. The GC content-related CpG deficiency is probably caused by context-dependent deamination of methylated CpG sites. This hypothesis was examined in this study by comparing nucleotide frequencies at CpG flanking positions among invertebrate and vertebrate genomes. The finding is a transition of nucleotide ...
متن کاملBroadening Gene Pool of Rice for Resistance to Biotic Stresses Through Wide Hybridization
Variability in the cultivated germplasm for economic traits such as resistance to rice tungro virus, sheathblight, yellow stem borer, drought and salt tolerance is limited. This necessitated search for the genes in secondary and tertiary gene pool of genus Oryza. Fortunately, wild species are an important reservoir ofuseful genes for resistance to major disease, pest and tolerance t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Genome research
دوره 18 2 شماره
صفحات -
تاریخ انتشار 2008